A Ranking Approach to Stress Prediction for Letter-to-Phoneme Conversion
نویسندگان
چکیده
Correct stress placement is important in text-to-speech systems, in terms of both the overall accuracy and the naturalness of pronunciation. In this paper, we formulate stress assignment as a sequence prediction problem. We represent words as sequences of substrings, and use the substrings as features in a Support Vector Machine (SVM) ranker, which is trained to rank possible stress patterns. The ranking approach facilitates inclusion of arbitrary features over both the input sequence and output stress pattern. Our system advances the current state-of-the-art, predicting primary stress in English, German, and Dutch with up to 98% word accuracy on phonemes, and 96% on letters. The system is also highly accurate in predicting secondary stress. Finally, when applied in tandem with an L2P system, it substantially reduces the word error rate when predicting both phonemes and stress.
منابع مشابه
A Multi-Strategy Approach to Improving Pronunciation by Analogy
Pronunciation by analogy (PbA) is a data-driven method for relating letters to sound, with potential application to next-generation text-to-speech systems. This paper extends previous work on PbA in several directions. First, we have included "full" pattern matching between input letter string and dictionary entries, as well as including lexical stress in letter-to-phoneme conversion. Second, w...
متن کاملLetter-to-Phoneme Conversion for a German Text-to-Speech System
This thesis deals with the conversion from letters to phonemes, syllabification and word stress assignment for a German text-to-speech system. In the first part of the thesis (chapter 5), several alternative approaches for morphological segmentation are analysed and the benefit of such a morphological preprocessing component is evaluated with respect to the grapheme-to-phoneme conversion algori...
متن کاملAn evaluation of non-standard features for grapheme-to-phoneme conversion
Machine learning methods for grapheme-to-phoneme (G2P) conversion are popular, but the features used in the literature are most often simply a window of context letters, despite the availability of other features. In this paper, a set of features beyond the seven-letter window, termed non-standard features, are systematically evaluated for American English, using decision trees. The results sho...
متن کاملApplying Many-to-Many Alignments and Hidden Markov Models to Letter-to-Phoneme Conversion
Letter-to-phoneme conversion generally requires aligned training data of letters and phonemes. Typically, the alignments are limited to one-to-one alignments. We present a novel technique of training with many-to-many alignments. A letter chunking bigram prediction manages double letters and double phonemes automatically as opposed to preprocessing with fixed lists. We also apply an HMM method ...
متن کاملJoint Processing and Discriminative Training for Letter-to-Phoneme Conversion
We present a discriminative structureprediction model for the letter-to-phoneme task, a crucial step in text-to-speech processing. Our method encompasses three tasks that have been previously handled separately: input segmentation, phoneme prediction, and sequence modeling. The key idea is online discriminative training, which updates parameters according to a comparison of the current system o...
متن کامل